To illustrate the degree to which the chosen units of analysis are likely to produce spurious findings in studies, combinations of multiple linear regression procedures are exemplified with grouping variables (e.g., classroom, school, and school district). Procedures such as Coleman's semipartial regression and Mayeske's commonalities, in light of hypothetical sample homogeneity and heterogeneity, are also illustrated. The utility of including grouping variables in the regression models, and alternatives such as collecting within-groups regression results in lieu of between-groups methods, are also considered. (Author)
Spurious Aggregation and the Units of Analysis*



In educational research studies the principal investigator takes steps to insure the validity of reported findings by designing the study properly and analyzing the data thoroughly. It is only when design and analysis efforts fail that spurious findings are reported. Beyond the reporting stage may lie the scrutiny of colleagues, an audience of policy makers with their own urgent perspectives, a need for replication, and a hope of further understanding of reported effects.

Our symposium today is concerned with the research implications of data aggregation and unit of analysis issues. These topics are many times obscured by overly detailed statistical and mathematical methodologies. Yet it is meaningful to characterize their intention with the familiar story of the man seeking to locate his house keys two blocks from where they were lost, because of better light where he was looking. In this same vein, let us say that those with information needs concerning state-level policy will not find appropriate answers using the pupil as the analysis unit, even though that is many times the most conventional place to look. Similarly, one cannot expect to shed light on the effects of a particular instructional technique on the pupil by analyzing district-level aggregates. The choice of unit of analysis in all cases must be directed by the research goals, by the audience for the findings and their special needs, and by such methodological considerations as independence of the units.

* Paper presented at the annual AERA convention held at San Francisco, April, 1976. Author's current address is: Suite 1137, 733 15th Street, Washington, D.C. 20005

It will now be noted how aggregation and unit of analysis phenomena have potential for creating spurious analysis outcomes. By way of proceeding to these problems, it is necessary to consider several areas that are often thought to be unrelated, and then to re-introduce analysis techniques from a new point of view.

Subsample Boundary (the G Variable)


Drawing subsamples of the units of analysis in a data base may be done several ways and for several reasons. Subsamples are often separated according to identifying features such as school grade level, instructional treatment, and so on. Once such samples are identified, it is a simple task to devise a mapping variable called G that will serve to define the boundaries of the subsample. This G variable is nothing more than a rule for one of two actions: (1) forming aggregates (or averages, as is usually the case), and (2) identifying interactions among elements of X and Y. Definition of interactions through the use of G is accomplished by simply using the rules to expand the analysis model or, in general linear model terminology, by expanding the number of predictor variables. The present paper shows several effects of sample boundary variables in routine analysis efforts, in light of hypothetical parent-sample homogeneity or heterogeneity.
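The two actions of a G variable can be sketched in a few lines of Python with NumPy. This is an illustration only: the six observations, the two subsamples, and all variable names are hypothetical, and the paper does not prescribe any software. G is used first to form subsample means, and then to expand the predictor set with G and a G-by-X interaction term.

```python
import numpy as np

# Hypothetical data: six observations split into two subsamples by G.
g = np.array([0, 0, 0, 1, 1, 1])                 # G: subsample membership (e.g., classroom)
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0, 3.0, 2.0, 1.0])

# Action 1: G as a rule for forming aggregates (subsample means).
levels = np.unique(g)
x_means = np.array([x[g == k].mean() for k in levels])
y_means = np.array([y[g == k].mean() for k in levels])

# Action 2: G as a rule for expanding the model (G and G-by-X interaction terms).
ones = np.ones_like(x)
base = np.column_stack([ones, x])                # Y on X, ignoring G
expanded = np.column_stack([ones, x, g, g * x])  # adds G and G*X as predictors
coef_base, *_ = np.linalg.lstsq(base, y, rcond=None)
coef_expanded, *_ = np.linalg.lstsq(expanded, y, rcond=None)

print("subsample means of X:", x_means, "of Y:", y_means)
print("pooled intercept, slope:", coef_base.round(3))
print("expanded intercept, slope, G, G*X:", coef_expanded.round(3))
```

In this made-up data the pooled slope of Y on X is 0.5, while the expanded model recovers a slope of 2 in one subsample and 2 + (-3) = -1 in the other, the kind of subsample inconsistency the G variable is meant to expose.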

Traditional Suppressor Phenomena

The action of independent variables adding to prediction of a dependent variable through apparent relationships with other independent variables, but not through direct relationships with the dependent variable, has been termed the suppressor effect. Two conditions must be met before an independent variable is thought to be operating as a classical or traditional suppressor (Cohen & Cohen, 1975; Conger, 1974) in multiple regression settings: (1) the X variable must be only slightly or not at all related to the dependent variable (Y), and (2) this same X variable must be strongly related to at least one other X variable. Under these conditions multiple prediction will be increased by an amount that usually exceeds the bivariate relation between Y and the X (suppressor) variable. For this reason, suppressors are welcome additions to multiple regression models. Beyond this highly specific pattern of bivariate correlations, a traditional suppressor is identified by its negative regression weight. Thus, the regression weight which a suppressor acquires is positive when it is negatively correlated with the dependent variable. In other language, the suppressor's zero-order correlation with Y and its standardized (or beta) weight have opposite signs.
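The classical pattern can be checked with the standard two-predictor formulas for standardized weights computed from correlations. The correlations below, r(Y, X1) = .50, r(Y, X2) = .00 and r(X1, X2) = .60, are hypothetical values chosen to satisfy the two conditions above; they are not data from the paper.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized weights and R^2 for Y on X1 and X2, computed from correlations."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared

# Hypothetical correlations: X2 is unrelated to Y but strongly related to X1.
r_y1, r_y2, r_12 = 0.50, 0.00, 0.60
beta1, beta2, r2_both = two_predictor_betas(r_y1, r_y2, r_12)
r2_x1_alone = r_y1 ** 2

print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
print(f"R^2: X1 alone = {r2_x1_alone:.3f}, X1 and X2 = {r2_both:.3f}")
```

With these values X2 receives a negative beta weight, and R² rises from .25 with X1 alone to about .39 with both predictors, although X2 by itself explains none of the Y variance.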

Suppressor phenomena occur only when an inconsistency exists between the X variable set and the Y variable. Inconsistency arises from the manner in which subsamples of Y are related differentially to X. The variance in Y may be visualized as being composed of a subset of Y scores bearing a positive correlation with selected X1 values, and another subset of Y bearing no correlation with the remainder of the X1 values in the equation Y = X1 + X2. This latter subset of Y is highly predictable from the corresponding X2 values, even though over the whole sample Y and X2 are not related. X2, in this example, is the suppressor variable. When all Y values are used to compute r(Y, X2), the correlation is near zero, but it is possible to find a smaller set of Y which is highly correlated with a corresponding set of X2. In order for the suppressor effect to operate, then, X1 and X2 must share these subsample boundaries.
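The subsample picture can be simulated. In this sketch (hypothetical data, NumPy assumed), a small subsample has Y closely tracking X2, while a larger and more variable subsample has Y unrelated to X2. The overall r(Y, X2) then comes out near zero even though it is high inside the first subsample.

```python
import numpy as np

rng = np.random.default_rng(7)  # fixed seed so the sketch is repeatable

# Subsample A (hypothetical, small): Y closely tracks X2.
x2_a = rng.normal(0.0, 1.0, 50)
y_a = x2_a + rng.normal(0.0, 0.2, 50)

# Subsample B (hypothetical, large): Y is unrelated to X2 and far more variable.
x2_b = rng.normal(0.0, 1.0, 950)
y_b = rng.normal(0.0, 10.0, 950)

# Combine the subsamples, as an analysis that ignores the boundary would.
x2 = np.concatenate([x2_a, x2_b])
y = np.concatenate([y_a, y_b])

r_all = np.corrcoef(y, x2)[0, 1]
r_subsample_a = np.corrcoef(y_a, x2_a)[0, 1]

print(f"r(Y, X2), all cases:    {r_all:.3f}")
print(f"r(Y, X2), subsample A:  {r_subsample_a:.3f}")
```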



Guilford (19??) and others do not appeal to this type of explanation of suppressor effects, although their discussion of variance borders on the sample boundary explanation proposed in the present paper. It is explained that X1, in spite of a correlation with Y, has some variance that correlates near zero with Y. It is because of this variance that the correlation between Y and X1 is prevented from being even larger. Now, variable X1 correlates highly with X2 (the suppressor) because they have in common that variance not shared by Y. Thus, including X2 in the multiple regression model partials this irrelevant variance out of X1, permitting the Y variance to find a more capable predictor. A boundary explanation is more generally useful than this traditional variance explanation of suppressors, since it leads to the understanding of X-Y relationships in a broader number of analysis situations. Usefulness of this explanation will next be demonstrated with commonality analysis interpretations.



Negative Commonalities

More recently than Guilford, Veldman (197?) and Kerlinger & Pedhazur (1973) have called attention to "negative contributions" in commonality analysis results. The commonality procedure examines each X variable as if it is the last variable in the independent variable set to be added to the regression equation. The net value of X in the equation is the percent increase in explained Y variance, as computed by subtracting the figures of the before-and-after X models. The commonality name for the procedure arises because pairs of variables, triplets, and so on are similarly treated as the last additions to the model, so that their common net value may be determined. Since the output of commonality analysis consists of percents of Y variance explained by both the single and joint contributions the X's make toward explaining Y variance, a negative contribution outcome is a signal that less than nothing in Y has been explained.



Negative contributions are possible only for commonality values, that is, for the joint contributions of pairs (or larger sets) of X variables, while they are impossible for unique (single X) contributions.
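For two predictors, the unique and common components can be written directly in terms of R² values: unique(X1) = R²(Y.12) - R²(Y.2), unique(X2) = R²(Y.12) - R²(Y.1), and common(X1, X2) = R²(Y.1) + R²(Y.2) - R²(Y.12). The sketch below applies these to a hypothetical suppressor pattern, r(Y, X1) = .50, r(Y, X2) = .00 and r(X1, X2) = .60; the function names are illustrative only.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 for Y regressed on X1 and X2, computed from the three correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)

def commonality_two(r_y1, r_y2, r_12):
    """Unique and common components of explained Y variance for two predictors."""
    r2_1 = r_y1 ** 2                          # X1 alone
    r2_2 = r_y2 ** 2                          # X2 alone
    r2_12 = r2_two_predictors(r_y1, r_y2, r_12)
    return {
        "unique_x1": r2_12 - r2_2,            # gain from adding X1 last
        "unique_x2": r2_12 - r2_1,            # gain from adding X2 last
        "common_x1_x2": r2_1 + r2_2 - r2_12,  # joint (commonality) component
    }

parts = commonality_two(0.50, 0.00, 0.60)
for name, value in parts.items():
    print(f"{name:>13}: {value:+.4f}")
```

Both unique components are positive, but the common component is about -.14: a negative commonality produced by the suppressor.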

Besides being a sad state of affairs after research funds have been expended for the study, explanation of less than nothing can also be a clear signal that suppressors are operating in the data set. Extending the present interpretation of suppressors to negative contributions, then, suggests that they are products of the action of a boundary variable. Just as there are inconsistencies in X-Y relationships which will produce suppressor phenomena, commonality analysis will also register these inconsistencies. By inconsistency is meant the irregular quality of X-Y correlations when subsamples of X and Y are considered. In other words, the sample is not homogeneous with regard to the manner in which X and Y relate statistically. For example, one portion of the sample may be highly and positively related on X and Y, while another may be negatively related.

Analysis of Variance 

The problem of sample heterogeneity can also be considered in the context of analysis of variance situations, where effects are sought as indicators of experimental treatment outcomes. The term "relationship" is traditionally used for observational or correlational research studies, while the term "effects" is usually reserved for experimental studies. In either case the G variable is a useful heuristic for the topic of spurious aggregation and the units of analysis.

Attention to parent-sample heterogeneity in experimental school effect studies comes from the popular textbook by Glass & Stanley (1970). Their verbal analysis of independence and the sampling units of analysis led them to conclude that degrees of freedom must suffer if the research setting is the intact classroom. A setting such as this is quite frequent in our research, and unfortunately it is many times mistreated in the statistical analysis work. Glass and Stanley (1970) recommend that one form classroom means and adjust degrees of freedom accordingly. That is, the researcher should aggregate the pupil scores, form classroom means, and adjust df to reflect the number of classrooms rather than the number of pupils.
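A minimal sketch of this recommendation, using only the Python standard library. The twelve pupil scores, four classrooms, and two treatments are invented for illustration. The same pooled two-sample t statistic is computed once with pupils as units and once with classroom means as units, so the degrees of freedom drop from (pupils - 2) to (classrooms - 2).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical pupil records: (classroom_id, treatment, score).
pupils = [
    ("c1", "A", 72), ("c1", "A", 75), ("c1", "A", 70),
    ("c2", "A", 80), ("c2", "A", 78), ("c2", "A", 83),
    ("c3", "B", 65), ("c3", "B", 68), ("c3", "B", 64),
    ("c4", "B", 70), ("c4", "B", 73), ("c4", "B", 69),
]

def classroom_means(records):
    """Aggregate pupil scores to one mean per classroom (the G variable)."""
    scores = defaultdict(list)
    treatment = {}
    for room, trt, score in records:
        scores[room].append(score)
        treatment[room] = trt
    return {room: (treatment[room], mean(vals)) for room, vals in scores.items()}

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Pupil as the unit: every score is treated as an independent observation.
t_pupil, df_pupil = pooled_t(
    [s for _, trt, s in pupils if trt == "A"],
    [s for _, trt, s in pupils if trt == "B"],
)

# Classroom as the unit: one mean per classroom, df based on classrooms.
means = classroom_means(pupils)
t_class, df_class = pooled_t(
    [m for trt, m in means.values() if trt == "A"],
    [m for trt, m in means.values() if trt == "B"],
)

print(f"pupils as units:     t = {t_pupil:.2f}, df = {df_pupil}")
print(f"classrooms as units: t = {t_class:.2f}, df = {df_class}")
```

With classroom means the test rests on 2 degrees of freedom rather than 10, which is the loss Glass and Stanley accept in exchange for using the classroom as the unit.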
In this situation, it may be clear already that the G variable is the classroom identification of each pupil. If Glass & Stanley are empirically correct, we must expect there to be considerable subsample heterogeneity with respect to G; for if the sample is found to be homogeneous with respect to G, then aggregation and loss of df is a questionable procedure.

Because aggregation may be questionable, and because it is often done (sometimes automatically) when analyzing both effects and relationships, the recommendation is best subjected routinely to empirical test. Poynor (1974) has provided evidence that the untested use of classroom aggregates as units of analysis (and sometimes of individual units of analysis) can lead to grievous Type I and Type II errors. Either unit of analysis can lead to errors, so exclusive use of either unit accomplishes nothing. A pre-analysis step is required to identify the proper unit of analysis. In an investigation of the independence of analysis units, Glendening (1976) discusses the use of this pre-analysis test, but finds it to be too conservative or too liberal a test in selected situations of independent and dependent units. These two studies show the empirical test of a unit of analysis with respect to sample homogeneity, and the importance of management control in effect studies for protection of the selected units.

Because of the great loss of information that occurs with aggregation (Poynor, 1975), it should not be done until after it is proven necessary.
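The paper does not spell out the pre-analysis test used by Poynor or Glendening, so the sketch below substitutes one common check as an assumption: the one-way analysis of variance F ratio and the intraclass correlation ICC(1) = (MSB - MSW) / (MSB + (k - 1) MSW) for pupils grouped by classroom, with k pupils per classroom. A large ICC points to heterogeneity with respect to G (pupils within a classroom are not independent units); an ICC near or below zero points to a sample that is homogeneous with respect to G. The scores and the equal group sizes are hypothetical.

```python
from statistics import mean

def one_way_f_and_icc(groups):
    """Return (F, ICC(1)) for a one-way layout with equal group sizes."""
    k = len(groups[0])                       # pupils per classroom
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes of at least 2")
    j = len(groups)                          # number of classrooms
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (j - 1)
    ms_within = ss_within / (j * (k - 1))
    f_ratio = ms_between / ms_within
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return f_ratio, icc1

# Hypothetical classrooms: scores cluster by classroom in the first set, not in the second.
heterogeneous = [[70, 72, 71], [80, 81, 79], [60, 61, 62]]
homogeneous = [[70, 80, 60], [71, 79, 61], [72, 81, 62]]

f_het, icc_het = one_way_f_and_icc(heterogeneous)
f_hom, icc_hom = one_way_f_and_icc(homogeneous)
print(f"heterogeneous w.r.t. G: F = {f_het:.1f}, ICC(1) = {icc_het:.2f}")
print(f"homogeneous w.r.t. G:   F = {f_hom:.2f}, ICC(1) = {icc_hom:.2f}")
```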

Correlation and Regression

Thorndike (1939) called attention to the operation of G in correlational (relationship) studies by illustrating the spurious development of a relationship across higher order aggregations. His early example calls attention to real-world manifestations of G. Twelve school districts were each asked to provide two data points on each pupil in the district: the pupil's IQ score (X) and the number of rooms available in the school building where each pupil was taught (Y). Each individual district was quite heterogeneous with respect to these variables. That is, within each district there was a broad range of IQs and rooms, yet there was no correlation between these X and Y variables within each district. The problem in the example comes from heterogeneity on these X-Y variables across districts. When the 12 districts were combined and a correlation coefficient was computed, it was .45, not .00 as it had been in each district individually. No aggregation had yet taken place. Once school district aggregates (means) were used, the resulting correlation was .90.
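A simulated version of the Thorndike pattern, with NumPy assumed and entirely hypothetical numbers (12 districts of 200 pupils, with equal within- and between-district spread). X and Y are generated independently within each district, but district means move together. The within-district correlations hover near zero, the pooled pupil-level correlation is moderate, and the correlation of district means approaches 1. The exact values depend on the random seed and are not meant to match Thorndike's figures.

```python
import numpy as np

rng = np.random.default_rng(1939)               # fixed seed so the sketch is repeatable
n_districts, n_pupils = 12, 200
offsets = np.arange(n_districts, dtype=float)   # hypothetical district differences shared by X and Y
within_sd = offsets.std()                       # within-district spread equal to between-district spread

x_by_district, y_by_district, within_r = [], [], []
for d in range(n_districts):
    # Within a district, X and Y are drawn independently, so their true correlation is zero.
    x = offsets[d] + rng.normal(0.0, within_sd, n_pupils)
    y = offsets[d] + rng.normal(0.0, within_sd, n_pupils)
    x_by_district.append(x)
    y_by_district.append(y)
    within_r.append(np.corrcoef(x, y)[0, 1])

# Pooled pupil-level correlation: districts combined, no aggregation yet.
pooled_r = np.corrcoef(np.concatenate(x_by_district), np.concatenate(y_by_district))[0, 1]

# District-level correlation: one mean per district.
x_means = [x.mean() for x in x_by_district]
y_means = [y.mean() for y in y_by_district]
means_r = np.corrcoef(x_means, y_means)[0, 1]

print(f"mean within-district r = {np.mean(within_r):.2f}")
print(f"pooled pupil-level r   = {pooled_r:.2f}")
print(f"district-means r       = {means_r:.2f}")
```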


Extensive research work with simulations and literature reviews has been done in the correlation and regression analysis areas by the team of Hannan and Burstein (Hannan, 1971; Burstein, 1975). These studies provided the impetus for the present synthesis of the effects and relationships areas using the G variable concept. This author has sought to apply their detailed statistical research findings to popular analysis models, using nontechnical language. Where the above authors refer to bias and inefficiency (or inaccuracy and inconsistency) of regression weights, these terms have been collected here under the label "spurious." To use their findings to predict spurious correlations, it is necessary to employ the G variable again. Briefly, they conclude that spurious relationships will be products of situations where strong relationships exist between G and X or between G and Y. Recall that both these situations were present in the Thorndike paper, which is somewhat classic as an example of confounded or proxy relationships.
Measurement of G

Before ending this paper, let me say that it is very little comfort to know that suppressors, negative commonalities, Type I and Type II errors, and inflated correlation coefficients can be explained in terms of a G variable used to establish the unit of analysis. Research which is carefully planned and conducted will rarely be affected by G as a nuisance at the time of data analysis. Still, our understanding of effects and relationships among measures of interest is many times insufficient to control all the potential contaminants of our findings.

An ultimate solution is believed to be the measurement of G itself, for if G is viewed as an abstract rule for selecting or forming observations into groups prior to analysis, then G is truly a potent treatment variable. This solution does not refer to the practice of simply including dummy variables in a regression equation as indicators of the school district and school building location of each observation.

The importance of fully specified models, or true starting models as they are sometimes called, is well known to data analysts. While the practice of using such dummy variables often increases the percent of explained criterion variance, it does nothing more than acknowledge G as potent.
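A small sketch of the point about dummy variables, with NumPy assumed and hypothetical data: adding school indicator variables to a regression of Y on X raises R², but the fitted dummy coefficients only register that schools differ. They say nothing about what it is about the schools that matters. The school effects here are random numbers standing in for unmeasured school differences.

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed so the sketch is repeatable
n_schools, n_per = 5, 40
school = np.repeat(np.arange(n_schools), n_per)  # G: school membership of each pupil
school_effect = rng.normal(0.0, 2.0, n_schools)  # hypothetical, unmeasured school differences
x = rng.normal(0.0, 1.0, school.size)
y = 1.0 * x + school_effect[school] + rng.normal(0.0, 1.0, school.size)

def r_squared(design, y):
    """Proportion of Y variance explained by an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

intercept = np.ones((school.size, 1))
# Dummy-code schools 1..4, with school 0 as the reference category.
dummies = (school[:, None] == np.arange(1, n_schools)).astype(float)

r2_x = r_squared(np.hstack([intercept, x[:, None]]), y)
r2_x_g = r_squared(np.hstack([intercept, x[:, None], dummies]), y)
print(f"R^2 with X only:            {r2_x:.3f}")
print(f"R^2 with X and school dummies: {r2_x_g:.3f}")
```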
It is the measurement of the underlying classroom, school building, or district differences that will promote our understanding of G and the effects and relationships associated with our criterion variables. Instead of making sterile statements such as "Twenty percent of the criterion variance was explained by school building differences," the researcher may someday be able to offer richer, more meaningful statements relating criterion variance to specific features of conditions, such as teacher warmth, type of disciplinary policy, student body cohesiveness, presence of open classrooms, or other substantive learning variables.